Cost-Effective Software Based Fault-Tolerant Routing in Pipelined Networks
Abstract
This paper presents a software-based approach to fault-tolerant routing in networks that use wormhole or virtual cut-through switching. When a message encounters a faulty output link, the local router removes it from the network and delivers it to the messaging layer of the local node’s operating system. The message-passing software can then re-route the message, possibly along a non-minimal path. Alternatively, the message may be addressed to an intermediate node, which forwards it to the destination. A message may encounter multiple faults and pass through multiple intermediate nodes. The proposed techniques apply to both obliviously and adaptively routed networks. They are specifically targeted at commercial multiprocessors, where the mean time to repair (MTTR) is much smaller than the mean time between router failures (MTBF), i.e., it is sufficient to tolerate a maximum of 2-3 failures. The paper presents requirements for buffer management, deadlock freedom, and livelock freedom. Simulation results evaluate the degradation in latency and throughput as a function of the number and distribution of faults. The approach has several advantages. Router designs are minimally impacted and thus remain compact and fast. Only messages that encounter faulty components are affected, while the machine is ensured of continued operation until the faulty components can be replaced. The technique leverages existing network technology and is a good candidate for incorporation into the next generation of multiprocessor networks.
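The re-routing idea in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's algorithm: it assumes a 2D mesh with dimension-order (XY) oblivious routing, assumes the message header carries the set of faulty links encountered so far, and uses a hypothetical software policy that picks a single intermediate node whose two XY legs avoid those known faults. The names `xy_path`, `choose_intermediate`, and `deliver` are invented for illustration.

```python
from itertools import product


def xy_path(src, dst):
    """Dimension-order (XY) route: correct the X coordinate first, then Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def links(path):
    """Undirected links traversed by a path, as frozensets of two nodes."""
    return [frozenset(pair) for pair in zip(path, path[1:])]


def choose_intermediate(cur, dst, known_faults, width, height):
    """Hypothetical software policy (an assumption, not from the paper):
    pick the intermediate node whose two XY legs avoid every known fault,
    preferring the fewest total hops."""
    best = None
    for mid in product(range(width), range(height)):
        legs = xy_path(cur, mid) + xy_path(mid, dst)[1:]
        if any(l in known_faults for l in links(legs)):
            continue
        if best is None or len(legs) < len(best[1]):
            best = (mid, legs)
    return best[0] if best else None


def deliver(src, dst, faulty_links, width, height):
    """Route a message; on a faulty output link, absorb it into the local
    node's software layer and re-inject it via an intermediate node."""
    known = set()        # faults recorded in the message header (assumed)
    route = [src]        # nodes actually visited
    absorptions = 0      # times the message was removed from the network
    cur = src
    waypoints = [dst]    # remaining targets, in order
    while waypoints:
        target = waypoints[0]
        if cur == target:
            waypoints.pop(0)  # intermediate reached: forward to next target
            continue
        nxt = xy_path(cur, target)[1]
        hop = frozenset((cur, nxt))
        if hop in faulty_links:
            absorptions += 1
            known.add(hop)
            mid = choose_intermediate(cur, dst, known, width, height)
            if mid is None:
                raise RuntimeError("no single-intermediate route avoids the known faults")
            waypoints = [mid, dst]
            continue
        cur = nxt
        route.append(cur)
    return route, absorptions


if __name__ == "__main__":
    fault = {frozenset(((0, 1), (0, 2)))}
    print(deliver((0, 0), (0, 2), fault, 4, 4))
```

In this sketch each absorption records a fault the message did not previously know about, so the number of absorptions is bounded by the number of faulty links. That is one simple way to argue the sketch cannot loop forever; the paper's own livelock and deadlock requirements are not reproduced here.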
Similar resources
Tree-Based Fault-Tolerant Multicast in Multicomputer Networks Using Pipelined Circuit Switching
A tree-based fault-tolerant multicast algorithm built on top of pipelined circuit switching is presented. For every multicast message, a multicast tree is constructed in a distributed and adaptive fashion. An underlying fault-tolerant routing algorithm is used to tolerate faulty nodes and links without requiring nodes to have global fault information. The multicast algorithm is provably deadloc...
CAFT: Cost-aware and Fault-tolerant routing algorithm in 2D mesh Network-on-Chip
As chip complexity increases and more components must be integrated into a single chip, the network-on-chip has become an important infrastructure for on-chip communication and a good alternative to traditional bus-based approaches. As chip density increases, the possibility of failure in the on-chip network also increases, and providing correction and fault tol...
Software Based Fault-Tolerant Oblivious Routing in Pipelined Networks
This paper presents a software based approach to fault-tolerant routing in oblivious, wormhole routed networks. When a message encounters a faulty output link it is removed from the network by the local router and delivered to the messaging layer of the local node’s operating system. The message passing software can re-route this message along a non-minimal oblivious path or via an intermediate...
A New Adaptive Fault-Tolerant Protocol for Direct Multiprocessors Networks
This paper investigates the fault tolerance problem in direct networks. Conservative flow control mechanisms such as Pipelined Circuit Switching (PCS) ensure the existence of a path to the destination before transmission. This yields a reliable fault-tolerant system at the expense of performance. Optimistic flow control mechanisms such as Wormhole Switching (WS) realize very good perfo...
Fault Tolerant Routing in Tri-Sector Wireless Cellular Mesh Networks
Multi-hop wireless mesh networks (WMNs) are emerging as a viable alternative for cost-effective access networks. WMNs can address the capacity and coverage limitations of traditional cellular, point-to-point wireless systems. In contrast to traditional cellular systems, WMNs need only one access point to the wired network, while other access points share a connection over the air. Wireless ...
Publication date: 2007